Skip to content

refactor(render): scheme owns bundle rendering [LTV-Pe]#108

Merged
shaypal5 merged 3 commits into
mainfrom
refactor/scheme-render-seam
Jun 10, 2026
Merged

refactor(render): scheme owns bundle rendering [LTV-Pe]#108
shaypal5 merged 3 commits into
mainfrom
refactor/scheme-render-seam

Conversation

@shaypal5

Copy link
Copy Markdown
Contributor

Summary

Completes the second half of the generation-scheme seam (LTV-Pe,
milestone LTV-M2) against the known-good lead-scoring path, so the
abstraction is fully formed before any file moves or lifecycle work. Pure
refactor — lead-scoring bundle output is byte-identical.

This reorders the roadmap on purpose: the render path is where schemes diverge
most (9-table vs 6-table, lead vs customer snapshot, classification vs
regression splits), so it's designed here against lead-scoring with
byte-identity as the oracle — not later, co-developed with an unfamiliar
lifecycle pipeline.

Changes

  • Protocol gains write_bundle(bundle, path, generation_timestamp).
  • LeadScoringScheme.write_bundle now owns the lead-scoring on-disk shape
    (relational export incl. the snapshot-safe projection, lead snapshot + task
    splits, dataset card, feature dictionary, exposure metadata, manifest),
    reusing the already-modular shared helpers (build_manifest,
    apply_exposure, get_filter).
  • api/bundle.py::write_bundle is now a thin dispatcher on
    bundle.spec.scheme; WorldBundle.save() delegates to the producing scheme.
    The historical write_bundle(bundle, path) entry point is preserved.
  • Registration footgun fixed (a finding from refactor(api): GenerationScheme protocol + registry [LTV-Pd] #107's review):
    get_scheme/available_schemes lazily trigger builtin registration, so
    resolution no longer depends on importing the leadforge.schemes package
    first — from leadforge.schemes.base import get_scheme now works on its own.

Roadmap reorder

LTV-Pe = render seam (this PR); LTV-Pf = physical module move (was Pe);
LTV-Pg = scaffold lifecycle (was Pf). Downstream codes shifted to
Pa..Ps (19 PRs). Same destination — the move now relocates a complete
scheme and bundle.py's call sites change only once.

Verification

✅ BYTE-IDENTICAL vs main (render seam) — 14 files
  • Tests: tests/schemes/test_render_dispatch.py (5) — save dispatch,
    unknown-scheme on write, unpopulated RuntimeError, render-path determinism,
    base-direct resolution (footgun guard, subprocess).
  • Full suite 1503 passed / 51 skipped (+5); ruff + mypy clean.
  • BUNDLE_SCHEMA_VERSION unchanged.

🤖 Generated with Claude Code

Completes the second half of the generation-scheme seam against the known-good
lead-scoring path, so the abstraction is fully formed before any file moves or
lifecycle work. Pure refactor — lead-scoring bundle output is byte-identical.

- GenerationScheme protocol gains write_bundle(bundle, path, generation_timestamp).
- LeadScoringScheme.write_bundle now owns the lead-scoring on-disk shape
  (9-table relational export incl. snapshot-safe projection, lead snapshot +
  task splits, dataset card, feature dictionary, exposure metadata, manifest),
  reusing the already-modular shared envelope helpers (build_manifest,
  apply_exposure, get_filter).
- api/bundle.py::write_bundle is now a thin dispatcher on bundle.spec.scheme;
  WorldBundle.save() delegates to the producing scheme. The historical
  write_bundle(bundle, path) entry point is preserved for existing callers.
- Registration footgun fixed: get_scheme/available_schemes lazily trigger
  builtin registration (_ensure_builtins), so resolution no longer depends on
  importing the leadforge.schemes package first — `from leadforge.schemes.base
  import get_scheme` now works on its own.

Roadmap reordered (docs/ltv/roadmap.md): the render seam (LTV-Pe) is sequenced
before the physical module move (now LTV-Pf), so the move relocates a complete,
both-halves scheme and bundle.py's call sites change only once. Downstream PR
codes shifted to Pa..Ps (19 PRs).

Verified byte-identical to main (14/14 files of a pinned-timestamp bundle).
Tests: tests/schemes/test_render_dispatch.py (5) — save dispatch, unknown-scheme
on write, unpopulated RuntimeError, render-path determinism, and base-direct
resolution (footgun guard, subprocess). Full suite 1503 passed / 51 skipped;
ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 09:55
@shaypal5 shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 10, 2026
@shaypal5 shaypal5 added type: refactor Code change with no behavior difference layer: render render/ bundle and artifact output layer: api api/ public Python surface status: needs review Ready for review dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream labels Jun 10, 2026
Check off LTV-Pe, link PR #108, advance status. Next: LTV-Pf (physical move).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors bundle rendering so that each generation scheme owns its on-disk bundle shape, with api/bundle.py becoming a thin dispatcher based on bundle.spec.scheme. This completes the “render half” of the GenerationScheme seam while aiming to keep lead-scoring bundle output byte-identical, and it hardens scheme resolution against import-order issues via lazy built-in registration.

Changes:

  • Add write_bundle(bundle, path, generation_timestamp) to the GenerationScheme protocol and implement it on LeadScoringScheme.
  • Make leadforge.api.bundle.write_bundle delegate to the resolved scheme instead of orchestrating rendering directly.
  • Ensure built-in scheme registration is triggered lazily on get_scheme() / available_schemes(); add tests for dispatch, determinism, and import-order independence.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
leadforge/schemes/base.py Extends the scheme protocol with write_bundle and adds lazy built-in registration on scheme resolution.
leadforge/schemes/lead_scoring/__init__.py Moves the lead-scoring bundle-writing orchestration into LeadScoringScheme.write_bundle.
leadforge/api/bundle.py Simplifies bundle writing into a scheme-based dispatch function.
tests/schemes/test_render_dispatch.py Adds coverage for save/write dispatch, unknown-scheme handling, determinism, and the import-order footgun fix.
docs/ltv/roadmap.md Updates LTV roadmap sequencing and PR codes to reflect the render-seam-first plan.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread leadforge/schemes/base.py
Comment on lines +112 to +118
global _builtins_loaded
if _builtins_loaded:
return
_builtins_loaded = True
import importlib

importlib.import_module("leadforge.schemes")
Comment on lines +238 to +245
(root / "dataset_card.md").write_text(
render_dataset_card(
bundle.spec,
task_manifest=task,
table_counts=table_row_counts,
features=visible_features,
)
)
…self-review) [LTV-Pe]

Hostile self-review of the first revision: it relocated the ~110-line
write_bundle monolith into LeadScoringScheme.write_bundle and the docstrings
over-claimed that a shared "envelope" was being reused, when in fact the write
orchestration was copied — lifecycle would duplicate ~60% of it.

Two honest corrections (no behaviour change; byte-identical preserved):
- Extract the one genuinely scheme-agnostic, safe-to-share step:
  `write_relational_tables(dfs, tables_dir, redacted) -> {name: rows}` in
  leadforge/render/relational.py (redaction-drop + parquet-write + row-count
  loop). LeadScoringScheme uses it; LifecycleScheme will too instead of copying.
- Stop over-claiming. Docstrings (api/bundle.py, base protocol,
  LeadScoringScheme.write_bundle) now state plainly that each scheme currently
  orchestrates its OWN full write; the deeper envelope/hook decomposition is
  deferred to LTV-M6 *because* build_manifest (takes world_graph) and
  apply_exposure (writes the lead-scoring hidden graph + latent registry) are
  still lead-scoring-coupled and the decomposition is best designed with a
  second scheme in hand.

Tests: tests/render/test_write_relational_tables.py (5) — per-table parquet +
row counts, nested-dir creation, redaction drop, no-redaction passthrough,
iteration-order preservation. Full suite 1508 passed / 51 skipped; ruff + mypy
clean; verified byte-identical to main (14/14 files).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

pr-agent-context report:

This run includes unresolved review comments on PR #108 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: leadforge/schemes/base.py:120
URL: https://github.com/leadforge-dev/leadforge/pull/108#discussion_r3387279835
Root author: copilot-pull-request-reviewer

Comment:
    `_ensure_builtins` sets `_builtins_loaded = True` before attempting the import. If `import_module('leadforge.schemes')` raises (or if multiple threads call this concurrently), the flag can be left `True` while built-ins are not actually registered, causing `get_scheme()` / `available_schemes()` to fail with `UnknownSchemeError` even though the package would be importable on retry. Set the flag only after a successful import so failures don’t poison the registry state.

## COPILOT-2
Location: leadforge/schemes/lead_scoring/__init__.py:249
URL: https://github.com/leadforge-dev/leadforge/pull/108#discussion_r3387279907
Root author: copilot-pull-request-reviewer

Comment:
    `Path.write_text(...)` defaults to the platform encoding. Since the bundle render path is meant to be byte-deterministic (and this file is part of that output), it’s safer to pin `encoding='utf-8'` (and optionally `newline='\n'`) to avoid locale-dependent bytes on systems where the default isn’t UTF-8.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27273050428 attempt 1
Comment timestamp: 2026-06-10T11:28:38.678724+00:00
PR head commit: 1f5fe07f3a2f169e48da09ab22631586874bc33a

@shaypal5 shaypal5 merged commit 5c0cc53 into main Jun 10, 2026
9 of 10 checks passed
@shaypal5 shaypal5 deleted the refactor/scheme-render-seam branch June 10, 2026 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dataset: leadforge-ltv-v1 Issue/PR scoped to the b2b_saas_ltv_v1 LTV dataset workstream layer: api api/ public Python surface layer: render render/ bundle and artifact output status: needs review Ready for review type: refactor Code change with no behavior difference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants